Efficient Top-k Query Processing Algorithms in Highly Distributed Environments
Abstract
Efficient top-k query processing in highly distributed environments is a valuable but challenging research topic. This paper addresses the problem over vertically partitioned data and proposes more efficient algorithms. To reduce the communication cost of query processing, the focus is on limiting both the amount of data transferred and the number of communication round trips among nodes. Two novel algorithms, BulkDBPA and 4RUT, are proposed. BulkDBPA is derived from BPA2, a centralized algorithm that requires very little data access. It borrows BPA2's idea of the best position, which keeps the amount of data transferred low, and further reduces the number of communication round trips by using bulk read and bulk transfer mechanisms. 4RUT is inspired by TPUT, an algorithm that needs only three communication round trips to obtain the exact top-k results. 4RUT adds one more round trip to improve TPUT's estimate of the top-k lower bound, which in turn reduces the data transferred during query processing. Experimental results show that both BulkDBPA and 4RUT transfer much less data and have much shorter response times than the competitors, including Simple Algorithm and TPUT, and that each algorithm suits its own class of application environments.
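BulkDBPA is said to borrow the best-position idea from BPA2. To illustrate that idea only, here is a minimal single-machine sketch of the original Best Position Algorithm (BPA) stopping rule; it is not BulkDBPA, which additionally batches reads and transfers between nodes, and it omits the BPA2 refinement that avoids re-reading positions. It assumes in-memory lists sorted by descending score, sum as the scoring function, and that every object appears in every list; `bpa_top_k` and `SortedList` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: list i holds attribute i sorted by descending score,
# and every object appears exactly once in every list.
SortedList = List[Tuple[str, float]]


def bpa_top_k(lists: List[SortedList], k: int) -> List[Tuple[str, float]]:
    m, n = len(lists), len(lists[0])
    # Random-access index per list: object -> (position, local score).
    index = [{obj: (pos, s) for pos, (obj, s) in enumerate(lst)} for lst in lists]
    seen = [set() for _ in range(m)]  # positions already seen in each list
    best = [-1] * m                   # best position per list
    totals: Dict[str, float] = {}

    def current_top() -> List[Tuple[str, float]]:
        return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

    for depth in range(n):
        # Sorted access at this depth in every list ...
        for i in range(m):
            obj = lists[i][depth][0]
            # ... followed by random access to the object's scores in all lists.
            total = 0.0
            for j in range(m):
                pos, s = index[j][obj]
                seen[j].add(pos)
                total += s
            totals[obj] = total
        # Best position: the largest p such that positions 0..p are all seen.
        for j in range(m):
            while best[j] + 1 in seen[j]:
                best[j] += 1
        # An unseen object sits below the best position in every list,
        # so its total cannot exceed this threshold.
        threshold = sum(lists[j][best[j]][1] for j in range(m))
        top = current_top()
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return current_top()
```

The fewer positions scanned before `top[-1][1] >= threshold` holds, the less data has to be read, which is the property that makes best-position-based algorithms attractive when every read crosses the network.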
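4RUT builds on the three round trips of TPUT (Three-Phase Uniform Threshold). The sketch below shows plain TPUT only; the extra 4RUT round trip that tightens the lower bound is not shown, because its details are not given here. It assumes nonnegative local scores, sum aggregation, and a coordinator that can reach every node; `tput_top_k`, `kth_largest`, and the dict-per-node data model are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical data model: node i holds one attribute as {object_id: local_score}.
Node = Dict[str, float]


def kth_largest(values, k: int) -> float:
    """Return the k-th largest value, or 0.0 if there are fewer than k values."""
    vals = sorted(values, reverse=True)
    return vals[k - 1] if len(vals) >= k else 0.0


def tput_top_k(nodes: List[Node], k: int) -> List[Tuple[str, float]]:
    m = len(nodes)
    # reports[obj][i] = score of obj that node i has sent to the coordinator.
    reports: Dict[str, Dict[int, float]] = defaultdict(dict)

    # Round trip 1: every node sends its local top-k.
    for i, node in enumerate(nodes):
        for obj, s in sorted(node.items(), key=lambda kv: -kv[1])[:k]:
            reports[obj][i] = s
    # Partial sums (unreported scores counted as 0) give a lower bound tau1.
    tau1 = kth_largest((sum(d.values()) for d in reports.values()), k)

    # Round trip 2: every node sends all objects with local score >= T.
    t = tau1 / m
    for i, node in enumerate(nodes):
        for obj, s in node.items():
            if s >= t:
                reports[obj][i] = s
    partial = {obj: sum(d.values()) for obj, d in reports.items()}
    tau2 = kth_largest(partial.values(), k)  # refined lower bound

    # Any score a node did not send is below T, which yields an upper bound.
    candidates = [obj for obj, d in reports.items()
                  if partial[obj] + (m - len(d)) * t >= tau2]

    # Round trip 3: fetch the exact scores of the surviving candidates.
    exact = {obj: sum(node.get(obj, 0.0) for node in nodes) for obj in candidates}
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Objects never reported in any round can be ignored safely, because all of their local scores are below T and their totals are therefore below m·T = tau1. The value `tau2` plays the role of the top-k lower bound: the higher it is, the more candidates are pruned before round trip 3, and the less data has to be transferred.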
Related articles
Efficient top-k processing in large-scaled distributed environments
The rapid development of networking technologies has made it possible to construct a distributed database that involves a huge number of sites. Query processing in such a large-scaled system poses serious challenges beyond the scope of traditional distributed algorithms. In this paper, we propose a new algorithm BRANCA for performing top-k retrieval in these environments. Integrating two orthog...
SPARQL Query Optimization on Top of DHTs
We study the problem of SPARQL query optimization on top of distributed hash tables. Existing works on SPARQL query processing in such environments have never been implemented in a real system, or do not utilize any optimization techniques and thus exhibit poor performance. Our goal in this paper is to propose efficient and scalable algorithms for optimizing SPARQL basic graph pattern queries. ...
Ad-hoc Top-k Query Answering for Data Streams
A top-k query retrieves the k highest scoring tuples from a data set with respect to a scoring function defined on the attributes of a tuple. The efficient evaluation of top-k queries has been an active research topic and many different instantiations of the problem, in a variety of settings, have been studied. However, techniques developed for conventional, centralized or distributed databases...
Top-k aggregation queries in large-scale distributed systems
Distributed top-k query processing has become an essential functionality in a large number of emerging application classes like Internet traffic monitoring and Peer-to-Peer Web search. This work addresses efficient algorithms for distributed top-k queries in wide-area networks where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers.
KLEE: A Framework for Distributed Top-k Query Algorithms
This paper addresses the efficient processing of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We present KLEE, a novel algorithmic framework for distributed top-k queri...
Journal title:
- JCP
Volume 9, Issue
Pages: -
Publication date: 2014